[RFC] Run bsim tests with twister #85610
Conversation
I'm quite surprised by this PR. Not just because it appeared quite out of the blue for me, but also because there has been quite a bit of related work by others (CC @gchwier) in the areas this overlaps with:
We have had this want for a very long time, but so far we have not succeeded. So instead of doing the same as we have done in previous PRs, where we discussed some details and ended up spending time polishing something which was rejected on principle issues, I would suggest we have a talk about it between the more involved and relevant people, try to reach an agreement on the principle, and if we do, then look at the particular details.
Force-pushed: 87f98dd → 8f1a143
I just added the 4.2 milestone, as the freeze for 4.1 is today, and this is neither ready nor urgent. So to make it clear: this is not trying to get into this release.
@ardo-nordic next time consider opening such PRs as draft until they pass CI, to avoid unnecessary noise for reviewers :)
Force-pushed: f747a95 → dcd499b
```python
def _generate_commands(self):
    def rs():
        return f'-rs={random.randint(0, 2**10 - 1)}'
```
I would be against having a random seed in tests unless the test asks for it. And even then, it would be better if the random value was printed somehow, so that the test can be reproduced easily.
We discussed it with @alwa-nordic and it may be better to use the device ID (same as -d) as the random seed.
The harness should not set the random seed on its own, much less to an actual random value.

> We discussed it with @alwa-nordic and it may be better to use the device ID (same as -d) as the random seed
In the executable, the random seed already defaults to a value that depends on the device number. There is no need to do that again.
> In the executable, the random seed already defaults to a value that depends on the device number. There is no need to do that again.
Okay, yeah I was pretty sure there was already something like that implemented but I wanted to test it today to confirm it ^^
```python
for file in [f for f in files if os.path.getctime(f) > self._start_time]:
    os.remove(file)
```
Suggested change:

```python
for file in files:
    if os.path.getctime(file) > self._start_time:
        os.remove(file)
```
I think it's more readable like that.
But I am wondering why you are looking at the change time? What is the reason?
Removing the files that were created by the test. Using generators with conditions is a standard practice in Python.
While you are right that list comprehensions are common practice, mixing the operations (filtering and iteration) isn't good practice.
You could do something like that if you want to keep the list comprehension:
```python
recent_files = [f for f in files if os.path.getctime(f) > self._start_time]
for file in recent_files:
    os.remove(file)
```
```python
def expand_args(dev_id):
    try:
        args = [str(eval(v)[dev_id]) if v.startswith('[') else v for v in extra_args]
```
It's a bit hard for me to understand why you are using eval() here. But this is something you should avoid if possible. It may execute arbitrary code.
@alwa-nordic is suggesting to use the str.format method instead.
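A minimal sketch of the str.format alternative, assuming the per-device values are expressed with a `{dev_id}` placeholder instead of a Python list literal (the placeholder convention and function name here are illustrative assumptions, not the harness's actual syntax):

```python
def expand_args(extra_args, dev_id):
    # Fill the device index into each argument template; plain strings
    # without a placeholder pass through unchanged, and no code is
    # ever executed, unlike with eval().
    return [arg.format(dev_id=dev_id) for arg in extra_args]

print(expand_args(['-flash=dev{dev_id}.nvm.bin', '-argstest', '10'], 1))
# → ['-flash=dev1.nvm.bin', '-argstest', '10']
```

Note that any literal braces in an argument would then need escaping as `{{` and `}}`.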
I might have misunderstood what this does when I gave my suggestion. I assumed it's some kind of variable interpolation. Now I see it's for defining separate bsim_options per device, right?
I suggest we transpose bsim_exe_name, bsim_test_ids, and bsim_options in the config, so it looks like a script. What do you think? Example below:
```yaml
devices:
  - exe: foo.exe
    bsim_testid: central
    custom_args:
      - -flash=bar.nvm.bin
      - -argstest
      - 10
  - exe: foo.exe
    bsim_testid: peripheral
    custom_args:
      - -flash=baz.nvm.bin
      - -argstest
      - 10
```

```python
self._start_time = time.time()
```
```python
def _get_exe_path(self, index):
    return self._exe_paths[index if index < len(self._exe_paths) else 0]
```
Not sure if returning the first element when the index is out of range is safe to do. Is it necessary? Seems like a bad practice that is hiding bugs.
It serves the purpose of handling the (quite common) case when the same executable runs on all simulated devices.
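The fallback can be illustrated with a standalone version of the helper (the paths below are made up for the example):

```python
def get_exe_path(exe_paths, index):
    # With one shared executable, any device index maps to it;
    # with per-device paths, each index selects its own entry.
    return exe_paths[index if index < len(exe_paths) else 0]

print(get_exe_path(['bs_central.exe'], 2))                   # → bs_central.exe
print(get_exe_path(['bs_central.exe', 'bs_periph.exe'], 1))  # → bs_periph.exe
```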
```python
def wait_bsim_ready(self):
    start_time = time.time()
    while time.time() - start_time < Bsim.BSIM_READY_TIMEOUT_S:
```
It would be nicer to have something smarter than this. As an early stopgap it may be ok though.
What does this code do?
It's here to ensure that the executable file needed for the test is compiled and ready.
So twister does not necessarily complete all build jobs before running? Won't this potentially suffer from a deadlock, e.g. when -j1?
If you, for example, run a test suite with one build test and a few run-only tests, and have 8 threads for this, the run-only tests need this wait mechanism because they'll attempt to launch at the same time as the build test. With one thread there will be no waiting in this function, because by the time it's called all binaries will be available.
```python
def _generate_commands(self):
    def rs():
        return f'-rs={random.randint(0, 2**10 - 1)}'
```
There is no way we can have this.
a) we do not want actual randomness.
b) we have known issues that are triggered or not depending on the random seed, so this would mean CI would start failing at random in some runs. This is not something we can have.
Will remove it.
```yaml
bsim_options:
  - -RealEncryption=1
```
Do we need the `- -`?
Could RealEncryption have been its own entry in the yaml, like `real_encryption: true`?
> real_encryption: true
I would really not go that way. Which options you can pass to an executable is an unbound and unknown set. They depend on the test, configuration and target, and are not the problem of the twister harness.
Yes, it's a pretty long list of arguments which will easily double the size of the schema if we add all of them.
```yaml
- -argstest
```
Doesn't -argstest indicate that the next arguments are test arguments? Where are the arguments here?
Looks like the common options are prepended to bsim_options below. I also think this is awkward.
The common section contains common options; it allows shortening the list when there is a lot of duplication, so that it's possible to clearly see only the options which differ. However, using it is not mandatory.
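The merge described here can be sketched as a plain dict/list combination. The key names mirror the YAML, but the merge function itself is an assumption about the intended behavior, not the harness code:

```python
def merge_harness_config(common, specific):
    # Scalar keys from the test-specific section win; list-valued
    # bsim_options are prepended with the common ones, so shared
    # flags appear first on the command line.
    merged = {**common, **specific}
    merged['bsim_options'] = (common.get('bsim_options', [])
                              + specific.get('bsim_options', []))
    return merged

common = {'bsim_options': ['-RealEncryption=1']}
specific = {'bsim_test_ids': ['dut1'], 'bsim_options': ['-flash_rm']}
print(merge_harness_config(common, specific))
```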
```yaml
bluetooth.host.adv.extended.build_scanner:
  build_only: true
  harness_config:
    bsim_exe_name:
      - tests_bsim_bluetooth_host_adv_extended_prj_scanner_conf
  extra_args:
    CONF_FILE=prj_scanner.conf
bluetooth.host.adv.extended.ext_adv:
  no_build: true
  harness_config:
    bsim_exe_name:
      - tests_bsim_bluetooth_host_adv_extended_prj_advertiser_conf
      - tests_bsim_bluetooth_host_adv_extended_prj_scanner_conf
    bsim_test_ids:
      - ext_adv_advertiser
      - ext_adv_scanner
```
I think it's interesting that you have a test here that is a build-only and one that is a no-build.
What's preventing this from just being a test that is being built and run?
Each test here requires two different executables, one for the scanner and one for the advertiser. Moving them to a separate build-only test allows building them only once and reusing the executables in subsequent tests without rebuilding.
```yaml
bluetooth.host.id.settings_dut2:
  no_build: true
  harness_config:
    bsim_exe_name: tests_bsim_bluetooth_host_id_settings_prj_conf
    bsim_test_ids:
      - dut2
    bsim_options:
      - -flash_rm
bluetooth.host.id.settings_dut1:
  no_build: true
  harness_config:
    bsim_test_ids:
      - dut1
```
There is a dependency between these tests that's not expressed here. The dut1 test leaves behind a file that is needed by the dut2 test.
I discussed this with @theob-pro. The path of least resistance is to make harness_config a list of simulations that share a simulation id and working directory and execute in sequence.

```yaml
bluetooth.host.id.settings:
  no_build: true
  harness_config:
    - bsim_test_ids:
        - dut1
    - bsim_test_ids:
        - dut2
      bsim_options:
        - -flash_rm
```
What do you think?
That's an interesting one. It really goes against the grain of testing when you make a test that relies on the result of a previous test. Ideally, it shouldn't be that way, so in my opinion the best solution is to redesign this test, and not do anything like that again. As it stands now, the test will pass if it's run as a batch (the pipeline queue is LIFO, hence the reverse order here in testcase.yaml). What I really don't want to do is add this very special case to the framework.
I can speak from the mesh perspective. The problem is that there is currently (unless I'm mistaken) no way to restart a bsim executable inside the test (basically the executable itself). For the test cases where we test that mesh correctly restores data from persistent storage, we need to simulate a device reboot. This is achieved by running several tests in a row using the same simulation id:
zephyr/tests/bsim/bluetooth/mesh/tests_scripts/persistence/provisioning.sh, lines 11 to 23 in 91cf42c:

```shell
# SKIP=(persistence_provisioning_data_save)
overlay=overlay_pst_conf
RunTestFlash mesh_pst_prov_data_check persistence_provisioning_data_save -flash_erase

# SKIP=(persistence_provisioning_data_load)
overlay=overlay_pst_conf
RunTestFlash mesh_pst_prov_data_check persistence_provisioning_data_load -flash_rm

overlay="overlay_pst_conf_overlay_ss_conf_overlay_psa_conf"
RunTestFlash mesh_pst_prov_data_check_psa persistence_provisioning_data_save -flash_erase

overlay="overlay_pst_conf_overlay_ss_conf_overlay_psa_conf"
RunTestFlash mesh_pst_prov_data_check_psa persistence_provisioning_data_load -flash_rm
```
> That really goes against the grain of testing when you make a test that relies on the result of a previous test. Ideally, it shouldn't be. So best solution is to redesign this test in my opinion, and not do anything like that again

I think there is a bit of a misunderstanding here. These tests, where something is run generating flash content and then something else is run which uses that flash content, are not different tests. They are just sub-steps of the same test.
Meaning, many tests consist of just running a set of devices which all connect to one phy. But that is just the typical way, not the only way.
The tests @alwa-nordic refers to run 2 or 3 sets of devices with phys in sequence (the whole sequence is one test). But we also have tests in tree running with the EDTT, and tests where only some of the devices' executables connect to the phy.
And that is in tree. Out of tree, we have tests that run postprocessing scripts on the simulation results and other programs to check what has happened.
One option would be to say that, for now, this twister harness does not support running those types of tests, but overall we need to keep supporting them.
```python
files = glob.glob(os.path.join(self._bsim_out_path, '*.log'))
# files += glob.glob(os.path.join(self._bsim_out_path, '*.bin'))
try:
    for file in [f for f in files if os.path.getctime(f) > self._start_time]:
```
I don't like that we are using ctime here. Won't this pick up files from parallel runs? Is there no better way to select files for cleaning?
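One alternative to ctime filtering, sketched below under the assumption that the output layout could be changed, is to give each run its own scratch directory and delete it wholesale, so parallel runs can never collide:

```python
import os
import shutil
import tempfile

# Each run gets a private directory; cleanup is then unambiguous and
# cannot touch files written by a concurrent run.
run_dir = tempfile.mkdtemp(prefix='bsim_run_')
with open(os.path.join(run_dir, 'device0.log'), 'w') as f:
    f.write('log output\n')

shutil.rmtree(run_dir)  # removes only this run's files
print(os.path.exists(run_dir))  # → False
```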
```python
    for t in threads:
        t.join(timeout=timeout)
except Exception as e:
```
If you use a naked `except Exception:` clause, you might end up catching exceptions other than the ones you expect to catch. This can hide bugs or make it harder to debug programs when unrelated errors are hidden.
https://pylint.readthedocs.io/en/v3.3.3/user_guide/messages/warning/broad-exception-caught.html
Understandable, but the purpose here is to not crash the whole session regardless of what happens, to use the status to just fail the test, and to allow the session to move on to the next one.
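The intent described here, catching broadly around one test so a failure is recorded rather than crashing the session, can be sketched like this (the names are illustrative, not the harness's actual API):

```python
def run_all(tests):
    # Run each (name, callable) pair; record a status instead of
    # letting any single failure abort the whole session.
    results = {}
    for name, fn in tests:
        try:
            fn()
            results[name] = 'passed'
        except Exception as e:  # deliberate broad catch: isolate per-test failures
            results[name] = f'failed: {e}'
    return results

def boom():
    raise ValueError('simulated crash')

results = run_all([('ok', lambda: None), ('boom', boom)])
print(results)  # → {'ok': 'passed', 'boom': 'failed: simulated crash'}
```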
```python
[os.path.join(self._bsim_out_path, exe_name) for exe_name in exe_names]
```
```python
def clean_exes(self):
    self._set_start_time()
```
Why does clean_exes call _set_start_time?
That's essentially the first actual harness function that is executed, at least for tests that compile.
```python
self._set_start_time()
```
```python
try:
    for exe_path in [self._get_exe_path(i) for i in range(len(self._exe_paths))]:
```
Suggested change:

```python
for exe_path in self._exe_paths:
```
| "bsim_test_ids": | ||
| type: seq | ||
| required: false | ||
| sequence: | ||
| - type: str | ||
| "bsim_verbosity": | ||
| type: int | ||
| required: false | ||
| "bsim_sim_length": | ||
| type: float | ||
| required: false | ||
| "bsim_options": | ||
| type: seq | ||
| required: false | ||
| sequence: | ||
| - type: str |
It's unfortunate we cannot use the schema required field because the real type of harness_config is dependent on harness. What do you think about creating a bsim container field under harness_config?
Not very fond of it either, but that's the current design, and I don't want to bloat the scope further.
```python
if new_exe_name:
    new_exe_name = f'bs_{self.instance.platform.name}_{new_exe_name}'
else:
    new_exe_name = f'bs_{self.instance.name}'
```
What's the difference between self.instance.name and self.instance.platform.name?
```python
if self._exe_path:
    return self._exe_path
```
The standard library has functools.cached_property that could be used here. I'll leave it up to you if you want to use it.
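A toy illustration of the `functools.cached_property` suggestion: the property body runs once on first access and the result is cached on the instance (the class and path below are made up, not the twister harness):

```python
from functools import cached_property

class ExampleHarness:
    def __init__(self):
        self.computations = 0

    @cached_property
    def exe_path(self):
        self.computations += 1  # body runs only on the first access
        return '/tmp/bs_example.exe'

h = ExampleHarness()
print(h.exe_path, h.exe_path, h.computations)
# → /tmp/bs_example.exe /tmp/bs_example.exe 1
```

This replaces the manual `if self._exe_path: return self._exe_path` memoization with a standard-library equivalent.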
The Bsim harness can now also be used to run tests. Signed-off-by: Artur Dobrynin <[email protected]>
BSIM tests might be reusing the same binary files, so it makes sense to be able to split tests into build-only, which will only build and copy executable files, and run-only, which will run them whenever needed. This requires a more sequential pipeline to avoid race conditions. Signed-off-by: Artur Dobrynin <[email protected]>
Adding functionality to split harness_config into common and test-case specific parts, which are joined together for any given test. Signed-off-by: Artur Dobrynin <[email protected]>
Adding Bsim phy-specific options to harness config. Clean up leftover log files. Adding option to expand extra args with individual values for different simulated devices. Adding random seed value to every sim device. Signed-off-by: Artur Dobrynin <[email protected]>
Changing bsim_exe_name values to a list due to respective schema changes. Signed-off-by: Artur Dobrynin <[email protected]>
Adding necessary changes to testcase.yaml files in all Bsim host tests to make them directly runnable by twister's bsim harness. Signed-off-by: Artur Dobrynin <[email protected]>
Shell scripts are no longer needed since twister can be used to build and run host bsim tests. Signed-off-by: Artur Dobrynin <[email protected]>
Increasing test timeouts for better margins. Removing obsolete shell runner files that were previously missed. Signed-off-by: Artur Dobrynin <[email protected]>
Changing timeouts in testcase.yaml files to be the same as they were in the shell running scripts. Signed-off-by: Artur Dobrynin <[email protected]>
Fixing minor compliance errors and issues popping up after rebase. Fixing unit tests. Signed-off-by: Artur Dobrynin <[email protected]>
With the added support for running bsim tests with Twister, a separate build command is no longer needed, as tests can be built and run with a single command. Signed-off-by: Artur Dobrynin <[email protected]>
Changed bsim test options in a way that allows explicit configuration of simulated devices, including names (test ids), options and exe names, while still retaining common options and exe name. Removed random seed generation for devices. Minor review updates. Signed-off-by: Artur Dobrynin <[email protected]>
Force-pushed: 0dd6198 → a2b3205
Bsim options schema was updated with explicit individual device configuration, thus updating all test config files in host. Signed-off-by: Artur Dobrynin <[email protected]>
Force-pushed: a2b3205 → 03cc943
Updating bsim harness documentation after adding functionality to run BabbleSim tests. Signed-off-by: Artur Dobrynin <[email protected]>
New bsim test that was added upstream needed to be converted to be run with Bsim harness. Signed-off-by: Artur Dobrynin <[email protected]>
```python
exe_names[exe_name] = replacer(f'bs_{self.instance.platform.name}_{exe_name}')

if not exe_names:
    exe_names[None] = f'bs_{replacer(self.instance.name)}'
```
What's the goal of doing that? Why None?
It can be an empty string; it doesn't matter. That handles the very common case where there is one binary file listed as 'bsim_exe_name' and there are no individual exes for the bsim devices.
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed; otherwise this pull request will automatically be closed in 14 days. Note that you can always re-open a closed pull request at any time.
The Bsim harness in twister has been modified to be able to run tests. The main changes it entailed are as follows:
At the moment, only the host tests use the bsim harness extensively, and thus they are the only ones affected.